Abstract: Algorithms based on multiple decoding attempts of 
Reed-Solomon (RS) codes have recently attracted new attention. 
Choosing decoding candidates based on rate-distortion theory, 
as proposed previously by the authors, currently provides the 
best performance-versus-complexity trade-off. In this paper, an 
analysis based on the rate-distortion exponent is used to directly 
minimize the exponential decay rate of the error probability. 
This enables rigorous bounds on the error probability for finite- 
length RS codes and leads to modest performance gains. As a 
byproduct, a numerical method is derived that computes the 
rate-distortion exponent for independent non-identical sources. 
Analytical results are given for errors/erasures decoding. 

I. Introduction 

The design of a computationally efficient soft-decision de- 
coding algorithm for Reed-Solomon (RS) codes has been the 
topic of significant research interest for the past several years. 
Currently, there are several soft-decision decoding algorithms 
for RS codes which exhibit a wide range of trade-offs 
between computational complexity and error performance. 

Among such decoding methods is a class of algorithms 
called multiple errors-and-erasures decoding. The algorithms 
belonging to this class first construct a set of erasure patterns 
based on the available soft information and then run an errors- 
and-erasures decoding algorithm, such as the Berlekamp- 
Massey (BM) algorithm, multiple times. Each time one 
erasure pattern in the set is used for decoding. By doing 
this, the algorithm outputs a list of candidate codewords 
and then chooses the best codeword from the list. Several 
algorithms of this type, including the popular generalized 
minimum distance (GMD) decoding algorithm, are discussed 
in [1], [2], [3], [4]. 

In [4], the authors proposed a rate-distortion (RD) approach 
for constructing the set of erasure patterns. The main 
idea is to choose an appropriate distortion measure so that the 
decoding is successful if and only if the distortion between 
the error pattern and erasure pattern is smaller than a fixed 
threshold. After that, a set of erasure patterns is generated 
randomly (similar to a random codebook generation) in order 
to minimize the expected minimum distortion. The approach 
was also extended to analyze multiple-decoding for decoding 
schemes beyond conventional errors-and-erasures decoding. 

This material is based upon work supported by the National Science 
Foundation under Grant No. 0802124. The work of P. Nguyen was also 
supported in part by a Vietnam Education Foundation fellowship. Any opin- 
ions, findings, conclusions, or recommendations expressed in this material 
are those of the authors and do not necessarily reflect the views of the 
National Science Foundation. 



One of the drawbacks of the RD approach is that the 
mathematical framework is only valid as the block-length 
goes to infinity. Therefore, we also consider the natural 
extension to a rate-distortion exponent (RDE) approach that 
studies the behavior of the probability, $p_e$, that the transmitted 
codeword is not on the list as a function of the block-length 
$N$. The overall error probability can be approximated by $p_e$ 
because the probability that the transmitted codeword is on 
the list but not chosen is very small compared to $p_e$. Hence, 
our new approach essentially focuses on investigating the 
exponent at which the error probability decays as $N$ goes 
to infinity. 

The proposed RDE approach can also be considered as 
the generalization of the RD approach since the latter is a 
special case of the former when the RDE function tends to 
zero. Using the RDE analysis, our proposed approach also 
helps answer the following two questions: (i) What is the 
maximum rate-distortion exponent achievable at or below a 
given number of decoding attempts (or a given size of the 
set of erasure patterns)? (ii) What is the minimum number 
of decoding attempts required to achieve a rate-distortion 
exponent at or above a given level? 

The paper is organized as follows. In Section II, we review 
multiple errors-and-erasures decoding algorithms and highlight 
the connection between multiple errors-and-erasures 
decoding and rate-distortion. Then, in Section III, we propose 
an RDE approach to construct a good set of erasure patterns for 
finite-length codewords. Next, in Section IV, we discuss how to 
compute the RDE function which is required in the proposed 
approach. Finally, simulation results are presented in Section V 
and a conclusion is provided in Section VI. 



II. Multiple errors-and-erasures Decoding 

In this section, we discuss several multiple errors-and- 
erasures decoding algorithms. While each algorithm uses a 
different set of erasure patterns, the common trend is that 
one either erases or tries several different candidates for each 
symbol in the least reliable positions (LRPs). One focuses on 
the LRPs because the hard decisions made at these positions 
are more likely to be incorrect. 

Let $\mathbb{F}_m$ be the Galois field with $m$ elements denoted as 
$\alpha_1, \alpha_2, \ldots, \alpha_m$. We consider an $(N,K)$ RS code of length $N$ 
and dimension $K$ over $\mathbb{F}_m$. Assume that we send a codeword 
$c = (c_1, c_2, \ldots, c_N)$ over some channel and $r = (r_1, r_2, \ldots, r_N)$ 
is the received vector. A well-known decoding threshold 
states that a single attempt of errors-and-erasures decoding 
succeeds if and only if 

$$2\nu + e < d_{\min} = N - K + 1 \tag{1}$$

where $e$ is the number of erased symbols and $\nu$ is the number of 
errors in unerased positions. A multiple errors-and-erasures 
decoding is considered to succeed if the decoding threshold 
(1) is satisfied for at least one attempt of decoding. Intuitively, 
the best case is when one erases an error and the worst case 
is when one wastes an erasure on a hard-decision symbol 
that turns out to be correct. 
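
Condition (1) can be checked mechanically once the error and erasure positions are known. The following Python snippet is a minimal sketch of that check; the function names and the representation of patterns as sets of positions are assumptions made for illustration and are not part of the original algorithm description.

```python
def attempt_succeeds(error_positions, erased_positions, n, k):
    """Check 2*v + e < d_min = n - k + 1 for one errors-and-erasures attempt."""
    errors = set(error_positions)
    erased = set(erased_positions)
    v = len(errors - erased)   # errors that remain in unerased positions
    e = len(erased)            # number of erased symbols
    return 2 * v + e < n - k + 1


def multiple_attempts_succeed(error_positions, erasure_patterns, n, k):
    """Multiple decoding succeeds if at least one erasure pattern satisfies (1)."""
    return any(attempt_succeeds(error_positions, pattern, n, k)
               for pattern in erasure_patterns)
```

For the (255,239) code, ten hard-decision errors defeat a single attempt without erasures (20 is not below 17), whereas erasing four of the erroneous positions gives 2*6 + 4 = 16 and the attempt succeeds.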

The first algorithm of this type is called Generalized 
Minimum Distance (GMD) decoding [1] where the set of 
erasure patterns is obtained by successively erasing the 
$0, 2, 4, \ldots, d_{\min} - 1$ LRPs (with the assumption that the minimum 
distance $d_{\min}$ is odd). Recent work by Lee and Kumar 
[2] proposes a soft-information successive (multiple) error-and-erasure 
decoding (SED) which constructs the set of 
erasure patterns in a more exhaustive way. Specifically, 
SED$(l,f)$ tries to erase all possible combinations of an even 
number less than or equal to $f$ of positions within the $l$ LRPs. 
The SED algorithm hence yields better performance but at 
increased complexity. 
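
The two deterministic constructions described above can be sketched as follows. This is only an illustrative Python sketch: the function names are hypothetical, and a per-position reliability score (smaller means less reliable) is assumed to be available as a list.

```python
from itertools import combinations


def gmd_erasure_patterns(reliabilities, d_min):
    """Erase the 0, 2, 4, ..., d_min - 1 least reliable positions (d_min odd)."""
    order = sorted(range(len(reliabilities)), key=lambda i: reliabilities[i])
    return [set(order[:e]) for e in range(0, d_min, 2)]


def sed_erasure_patterns(reliabilities, l, f):
    """Erase every even-sized subset (size <= f) of the l least reliable positions."""
    order = sorted(range(len(reliabilities)), key=lambda i: reliabilities[i])
    lrps = order[:l]
    patterns = []
    for size in range(0, min(l, f) + 1, 2):
        patterns.extend(set(c) for c in combinations(lrps, size))
    return patterns
```

Under this reading, SED(12,12) produces the sum of the even binomial coefficients of 12, that is 2^11 = 2048 patterns, while GMD for a code with minimum distance 17 produces only 9.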

In an attempt to answer the question of how to build a 
good set of erasure patterns in terms of performance versus 
complexity, we proposed in [4] a probabilistic method based 
on rate-distortion theory and random coding arguments instead 
of the deterministic methods which had been used in 
previously proposed algorithms. Basically, after defining $x^N$ 
and $\hat{x}^N$ as an error pattern and an erasure pattern whose letters 
$x_i$'s and $\hat{x}_i$'s are in the alphabets $\mathcal{X}$ and $\hat{\mathcal{X}}$ respectively, a 
letter-by-letter distortion measure $\delta : \mathcal{X} \times \hat{\mathcal{X}} \to \mathbb{R}_{\geq 0}$ is chosen 
properly so that the condition (1) can be reduced to the form 

$$d(x^N, \hat{x}^N) < N - K + 1 \tag{2}$$

where the distortion between an error pattern and an erasure 
pattern, $d(x^N, \hat{x}^N) = \sum_{i=1}^{N} \delta(x_i, \hat{x}_i)$, is smaller than a 
fixed threshold. In general, an appropriate distortion measure 
$\delta(j,k)$ for every $j \in \mathcal{X}$ and $k \in \hat{\mathcal{X}}$ should be specified. 

Example 1: Consider a specific class of multiple errors-and-erasures 
(Berlekamp-Massey) top-$\ell$ decoding (mBM-$\ell$) 
for a positive integer $\ell$ smaller than the field size $m$ where, 
at each codeword index, up to the $\ell$-th most likely symbols 
are taken care of. In this case, $\mathcal{X} = \hat{\mathcal{X}} = \mathbb{Z}_{\ell+1}$ and $x^N \in \mathcal{X}^N$ 
where at each index $i$, $x_i = 0$ implies that using up to the $\ell$-th 
most likely symbols as the hard-decision all gives an error, and 
$x_i = j$ implies that the $j$-th most likely symbol is correct for 
$j = 1, 2, \ldots, \ell$; $\hat{x}^N \in \hat{\mathcal{X}}^N$ where at each index $i$, $\hat{x}_i = 0$ implies 
that an erasure is applied and $\hat{x}_i = k$ implies that the $k$-th most 
likely symbol is used as the hard-decision for $k = 1, 2, \ldots, \ell$. 
For example, mBM-1 is the case of multiple conventional 
errors-and-erasures decoding. The letter-by-letter distortion 
measure for mBM-1 is chosen in the following way 

$$\begin{array}{ll} \delta(0,0) = 1 & \delta(0,1) = 2 \\ \delta(1,0) = 1 & \delta(1,1) = 0. \end{array} \tag{3}$$

It is also possible to choose appropriate distortion measures 
that work for $\ell > 1$ and other decoding schemes such as 
algebraic soft-decision (ASD) decoding. Still, the main idea 
is to convert the decoding threshold of the corresponding 
decoding scheme into the form of (2). 
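
To see why the mBM-1 distortion measure reproduces the left-hand side of condition (1), the short Python sketch below compares the accumulated distortion with a direct count of $2\nu + e$. The table layout and function names are illustrative assumptions; the convention is that `x[i] = 0` marks a wrong hard decision and `x_hat[i] = 0` marks an erasure, as in Example 1.

```python
# Distortion table for mBM-1, indexed as DELTA[x][x_hat].
# x = 0: hard decision wrong,  x = 1: hard decision correct
# x_hat = 0: position erased,  x_hat = 1: hard decision kept
DELTA = [[1, 2],
         [1, 0]]


def distortion(x, x_hat, delta=DELTA):
    """Letter-by-letter distortion d(x^N, x_hat^N) summed over all positions."""
    return sum(delta[a][b] for a, b in zip(x, x_hat))


def two_v_plus_e(x, x_hat):
    """Direct count: twice the unerased errors plus the number of erasures."""
    v = sum(1 for a, b in zip(x, x_hat) if a == 0 and b == 1)
    e = sum(1 for b in x_hat if b == 0)
    return 2 * v + e
```

Every erased position contributes 1 regardless of whether it was in error, and every unerased error contributes 2, so the two quantities agree for all pattern pairs.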

Thus, by viewing $x^N$ (resp. $\hat{x}^N$) as a source sequence (resp. 
reproduction sequence) and choosing a suitable distortion 
measure, the task of designing a good set of erasure patterns 
turns out to be how to best approximate the source sequence 
with a minimum number of reproduction sequences. In other 
words, it becomes a covering problem where one wants to 
cover the most-likely error patterns with the fewest number 
of balls whose centers are erasure patterns. The main steps 
in the RD based algorithm are given here briefly, but more 
detail can be found in [4]. 

Step 1: Empirically compute the reliability matrix whose 
entries are $\Pr(c_i = \alpha_j \mid r_i)$ for $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, m$. 
From this, determine the probability matrix $P$ where $p_{i,j} = \Pr(x_i = j)$ 
for $i = 1, 2, \ldots, N$ and $j \in \mathcal{X}$. 

Step 2: Compute the RD function using $P$. Determine the 
test-channel input-distribution matrix $Q$ where $q_{i,k} = \Pr(\hat{x}_i = k)$ 
for $i = 1, 2, \ldots, N$ and $k \in \hat{\mathcal{X}}$ that achieves a point on the 
RD curve corresponding to a chosen rate $R$. 

Step 3: Randomly generate a set $\mathcal{B}$ of $2^R$ erasure patterns 
using the distribution matrix $Q$ in the correct reliability order 
of the codeword positions. 

Step 4: Run multiple attempts of the corresponding decoding 
scheme using the set $\mathcal{B}$ to produce a list of candidate 
codewords. 

Step 5: Use Maximum-Likelihood (ML) decoding to pick 
the best codeword on the list. 
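
Step 3 and the covering view behind Step 4 can be illustrated with a small Python sketch. It assumes that the matrix Q is already available as a list of per-position probability rows arranged in the reliability order of the codeword positions; the function names and the fixed seed are hypothetical choices for illustration, and no actual BM or ASD decoder is invoked.

```python
import random


def generate_erasure_patterns(Q, rate_bits, seed=0):
    """Draw 2**rate_bits patterns; position i takes value k with probability Q[i][k]."""
    rng = random.Random(seed)
    symbols = [list(range(len(row))) for row in Q]
    return [[rng.choices(symbols[i], weights=Q[i])[0] for i in range(len(Q))]
            for _ in range(2 ** rate_bits)]


def min_distortion(x, patterns, delta):
    """Smallest distortion between error pattern x and any pattern in the set."""
    return min(sum(delta[a][b] for a, b in zip(x, pat)) for pat in patterns)
```

With the mBM-1 distortion measure, a decoding attempt in the list satisfies the threshold exactly when `min_distortion` falls below N - K + 1.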
III. RATE-DISTORTION EXPONENT APPROACH 

In the RD approach, the set $\mathcal{B}$ of $2^{N\bar{R}}$ (or $2^R$) erasure 
patterns can be generated randomly so that¹ 

$$\lim_{N \to \infty} \frac{1}{N} E_{x^N}\left[\min_{\hat{x}^N \in \mathcal{B}} d(x^N, \hat{x}^N)\right] \leq \bar{D}.$$

Thus, for large enough $N$, with high probability we have 
$\min_{\hat{x}^N \in \mathcal{B}} d(x^N, \hat{x}^N) \leq N\bar{D} = D$. Basically, [4] focuses on minimizing 
the average minimum distortion with little knowledge 
of how the tail of the distribution behaves. In this paper, 
we instead focus on directly minimizing the probability that 
the minimum distortion is not less than the pre-determined 
threshold $D = N - K + 1$ (due to the condition (2)) with the 
help of an error-exponent analysis. The exact probability of 
interest is $p_e = \Pr(x^N : \min_{\hat{x}^N \in \mathcal{B}} d(x^N, \hat{x}^N) > D)$ that reflects 
how likely the decoding threshold (1) is going to fail. 

In other words, every error pattern $x^N$ can be covered by 
a sphere centered at an erasure pattern $\hat{x}^N$ except for a set 
of error patterns of probability $p_e$. The RDE analysis shows 
that $p_e$ decays exponentially as $N \to \infty$ and the maximum 
exponent attainable is the RDE function. In our context, we 
have a source sequence $x^N$ of $N$ independent non-identical 
source components. We denote the rate-distortion exponent 
by $F(R,D)$ using unnormalized quantities (i.e., without dividing 
by $N$) and note that the exponent used by other authors in 
[5], [6], [7] is often the normalized version $\bar{F}(\bar{R},\bar{D}) = F(R,D)/N$. 

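The original RDE function $F(R,D)$, defined in [5] for a 
single source $x$, is given by² 

$$F(R,D) = \max_{w} \min_{\tilde{p} \in \mathcal{P}_{R,D}} \sum_j \tilde{p}_j \log \frac{\tilde{p}_j}{p_j}$$

where $p_j = \Pr(x = j)$, $w_{k|j} = \Pr(\hat{x} = k \mid x = j)$, and 

$$\mathcal{P}_{R,D} = \left\{ \tilde{p} \; : \; \begin{array}{l} \sum_j \sum_k \tilde{p}_j w_{k|j} \log \dfrac{w_{k|j}}{\sum_{j'} \tilde{p}_{j'} w_{k|j'}} \geq R \\ \sum_j \sum_k \tilde{p}_j w_{k|j} \delta_{jk} \geq D \end{array} \right\}.$$

¹ We denote the rate and distortion by $R$ and $D$, respectively, using 
unnormalized quantities, i.e., $R = N\bar{R}$ and $D = N\bar{D}$. 
² All logarithms are taken to base 2. 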

The RDE was first extensively discussed in [5], [6] and 
their results show that there exists a set $\mathcal{B}$ of roughly $2^R$ 
codewords, generated randomly using the test-channel input 
distribution matrix $Q$, that achieves $F(R,D)$. This gives the 
upper bound that for every $\epsilon > 0$, we have 

$$p_e \leq 2^{-N[\bar{F}(\bar{R},\bar{D}) - \epsilon]} \tag{4}$$

for large enough $N$ (see [8, p. 229]). An exponentially tight 
lower bound for $p_e$ can also be obtained for large enough $N$ 
(see [8, p. 236]) and this gives 

$$\lim_{N \to \infty} -\frac{1}{N} \log p_e = \bar{F}(\bar{R},\bar{D}).$$

Proposed algorithm: In the RDE approach proposed here, 
instead of computing the RD function, we need to compute 
the RDE function $F(R,D)$ along with the optimal test-channel 
input distribution matrix $Q$ (see Section IV). This 
distribution serves as a replacement for the distribution used 
in Step 2 of the RD based algorithm given in the previous 
section. Apart from this, the other steps of that algorithm are 
unchanged for the proposed RDE-based algorithm. 

Remark 1: The RDE approach possesses several advantages. 
First, it can help one estimate the smallest number 
of decoding attempts to get to an RDE of $F$ (or get to an 
error probability of roughly $2^{-F}$) or, similarly, allow one to 
estimate the RDE (and error probability) for a fixed number 
of decoding attempts. Second, it provides a converse based on 
the sphere-packing lower bound for $p_e$. This implies 
that, given an arbitrary set $\mathcal{B}$ of roughly $2^{N\bar{R}}$ erasure patterns 
and any $\epsilon > 0$, the probability $p_e$ cannot be made lower than 
$2^{-N[\bar{F}(\bar{R},\bar{D}) + \epsilon]}$ for $N$ large enough. Thus, no matter how one 
chooses the set $\mathcal{B}$ of erasure patterns, the difference between 
the induced probability of error and the $p_e$ for the RDE 
approach becomes negligible for $N$ large enough. 

Remark 2: It is interesting to note that the RDE approach 
for ASD decoding schemes contains the special case where 
the codebook has only one entry (i.e., ASD decoding is run 
one time). In this case, the distribution of the codebook that 
maximizes the exponent implicitly generates the optimal 
multiplicity matrix. This is similar to the line of work [9], [10], 
[11] where various researchers tried to find the multiplicity 
matrix that optimizes the error-exponent obtained by either 
applying a Chernoff bound [9], [10] or using Sanov's theorem 
[11]. 

IV. COMPUTING THE RDE FUNCTION 

In this section, we first present an extension of Arimoto's 
numerical method for computing the RDE function [12] that 
works for any chosen discrete distortion measure. Next, we 
consider a special case where we can give an analytical 
treatment of the function. 

A. Numerical computation of RDE function 

For each discrete source component $x_i$, given two parameters 
$s \geq 0$ and $t \leq 0$, the Arimoto algorithm given in [12] 
computes the RDE function numerically as follows. 

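• Step 1: Choose an arbitrary all-positive distribution 
vector $q^{(0)} = (q_1^{(0)}, q_2^{(0)}, \ldots, q_{|\hat{\mathcal{X}}|}^{(0)})$. 

• Step 2: Iterate the following steps at $\tau = 0, 1, \ldots$ 

$$w_{k|j}^{(\tau)} = \frac{q_k^{(\tau)} 2^{t\delta_{jk}}}{\sum_{k'} q_{k'}^{(\tau)} 2^{t\delta_{jk'}}}, \qquad q_k^{(\tau+1)} = \frac{\left(\sum_j p_j 2^{-st\delta_{jk}} \big(w_{k|j}^{(\tau)}\big)^{1+s}\right)^{\frac{1}{1+s}}}{\sum_{k'} \left(\sum_j p_j 2^{-st\delta_{jk'}} \big(w_{k'|j}^{(\tau)}\big)^{1+s}\right)^{\frac{1}{1+s}}}$$

for $j \in \mathcal{X}$ and $k \in \hat{\mathcal{X}}$. 

It is shown by Arimoto that $w_{k|j}^{(\tau)} \to w_{k|j}^*$ and $q_k^{(\tau)} \to q_k^*$ as 
$\tau \to \infty$. Using the resulting $w_{k|j}^*$ and $q_k^*$ we can compute 

$$D|_{s,t} = \sum_j \sum_k p_j^* w_{k|j}^* \delta_{jk} \tag{5}$$

$$R|_{s,t} = \sum_j \sum_k p_j^* w_{k|j}^* \log \frac{w_{k|j}^*}{\sum_{j'} p_{j'}^* w_{k|j'}^*} \tag{6}$$

$$F|_{s,t} = \sum_j p_j^* \log \frac{p_j^*}{p_j} \tag{7}$$

where $p_j^* = \dfrac{p_j \left(\sum_k q_k^* 2^{t\delta_{jk}}\right)^{-s}}{\sum_{j'} p_{j'} \left(\sum_k q_k^* 2^{t\delta_{j'k}}\right)^{-s}}$. 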

However, in the context we consider, the source (error 
pattern) $x^N$ comprises independent but not necessarily identical 
source components $x_i$'s. The complexity is a problem 
if we consider a group of source letters $(j_1, j_2, \ldots, j_N)$ as 
a super-source letter $\mathcal{J}$, a group of reproduction letters 
$(k_1, k_2, \ldots, k_N)$ as a super-reproduction letter $\mathcal{K}$ and apply 
the Arimoto algorithm straightforwardly. Instead, we can 
avoid this computational obstacle by choosing the initial 
distribution still arbitrarily but following a factorization rule 
$q_{\mathcal{K}}^{(0)} = \prod_{i=1}^{N} q_{k_i}^{(0)}$. Then, we can verify that this factorization 
rule still holds for $w_{\mathcal{K}|\mathcal{J}}^{(\tau)}$ and $q_{\mathcal{K}}^{(\tau)}$ after every step of the 
Arimoto algorithm. This leads to 

$$w_{\mathcal{K}|\mathcal{J}}^* = \prod_{i=1}^{N} w_{k_i|j_i}^* \quad \text{and} \quad q_{\mathcal{K}}^* = \prod_{i=1}^{N} q_{k_i}^*.$$

Combining with $\delta_{\mathcal{J}\mathcal{K}} = \sum_{i=1}^{N} \delta_{j_i k_i}$ and $p_{\mathcal{J}} = \prod_{i=1}^{N} p_{j_i}$, we 
have 

$$p_{\mathcal{J}}^* = \prod_{i=1}^{N} p_{j_i}^*.$$

This gives the following proposition. 

Proposition 1: (Factored Arimoto algorithm for RDE 
function) Consider a discrete source $x^N$ of independent but 
non-identical source components $x_i$'s. Given parameters $s \geq 0$ 
and $t \leq 0$, the exponent, rate and distortion are given by 

$$F|_{s,t} = \sum_{i=1}^{N} F_i|_{s,t}, \qquad R|_{s,t} = \sum_{i=1}^{N} R_i|_{s,t}, \qquad D|_{s,t} = \sum_{i=1}^{N} D_i|_{s,t}$$

where the components $F_i|_{s,t}$, $R_i|_{s,t}$, $D_i|_{s,t}$ are computed 
parametrically by the Arimoto algorithm. 
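
The factored computation in Proposition 1 can be sketched in Python as below. This is a minimal sketch under stated assumptions rather than the authors' implementation: the alternating updates are taken to be $w_{k|j} \propto q_k 2^{t\delta_{jk}}$ followed by $q_k \propto (\sum_j p_j 2^{-st\delta_{jk}} w_{k|j}^{1+s})^{1/(1+s)}$, the tilted source is taken to be $p_j^* \propto p_j (\sum_k q_k 2^{t\delta_{jk}})^{-s}$, the starting point is uniform, and a fixed iteration count replaces a convergence test. All function names are hypothetical.

```python
import math


def arimoto_rde_component(p, delta, s, t, iters=1000):
    """Return (F, R, D, q) for one source component at parameters s >= 0, t <= 0."""
    nj, nk = len(p), len(delta[0])
    q = [1.0 / nk] * nk                      # all-positive starting point (assumed uniform)
    for _ in range(iters):
        # w[j][k] proportional to q[k] * 2^(t * delta[j][k])
        w = []
        for j in range(nj):
            row = [q[k] * 2.0 ** (t * delta[j][k]) for k in range(nk)]
            z = sum(row)
            w.append([v / z for v in row])
        # q[k] proportional to (sum_j p[j] 2^(-s t delta[j][k]) w[j][k]^(1+s))^(1/(1+s))
        a = [sum(p[j] * 2.0 ** (-s * t * delta[j][k]) * w[j][k] ** (1 + s)
                 for j in range(nj)) ** (1.0 / (1 + s)) for k in range(nk)]
        z = sum(a)
        q = [v / z for v in a]
    # Tilted source p*[j] proportional to p[j] * (sum_k q[k] 2^(t delta[j][k]))^(-s)
    S = [sum(q[k] * 2.0 ** (t * delta[j][k]) for k in range(nk)) for j in range(nj)]
    tilt = [p[j] * S[j] ** (-s) for j in range(nj)]
    total = sum(tilt)
    p_star = [v / total for v in tilt]
    w = [[q[k] * 2.0 ** (t * delta[j][k]) / S[j] for k in range(nk)] for j in range(nj)]
    out = [sum(p_star[j] * w[j][k] for j in range(nj)) for k in range(nk)]
    D = sum(p_star[j] * w[j][k] * delta[j][k] for j in range(nj) for k in range(nk))
    R = sum(p_star[j] * w[j][k] * math.log2(w[j][k] / out[k])
            for j in range(nj) for k in range(nk) if p_star[j] * w[j][k] > 0)
    F = sum(ps * math.log2(ps / pj) for ps, pj in zip(p_star, p) if ps > 0)
    return F, R, D, q


def factored_rde(P, delta, s, t):
    """Sum the per-component exponent, rate and distortion over all N components."""
    parts = [arimoto_rde_component(p_i, delta, s, t)[:3] for p_i in P]
    return tuple(sum(col) for col in zip(*parts))
```

Because each component is handled on its own small alphabet, the cost grows linearly in N instead of exponentially, which is the point of the factorization rule.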

B. Analytical computation of RDE function 

In this subsection, we consider the case mBM-1 whose 
distortion measure is given in (3). We study the setup where 
RS codewords defined over the Galois field $\mathbb{F}_m$ are transmitted 
over the $m$-ary symmetric channel ($m$-SC) which for each 
parameter $p$ can be modeled as 

$$\Pr(r \mid c) = \begin{cases} p & \text{if } r = c \\ (1-p)/(m-1) & \text{if } r \neq c. \end{cases}$$

Here, $c$ (resp. $r$) is the transmitted (resp. received) symbol 
and $r, c \in \mathbb{F}_m$. With this channel model, we consider $p$ not 
too small so that $p > (1-p)/(m-1)$. Therefore, at each 
index $i$ of the codeword, the hard-decision is also the received 
symbol and then it is correct with probability $p$. Thus, we 
have $p_{i,1} = \Pr(x_i = 1) = p$ for every index $i$ of the error 
pattern $x^N$. That means, in this context we have a source $x^N$ 
with i.i.d. binary components $x_i$. Since the components 
$x_i$ are i.i.d., we can treat each $x_i$ as a binary source $X$ with 
$\Pr(X = 1) = p$ and $\Pr(X = 0) = 1 - p = \bar{p}$ and first compute 
the RDE function for this source $X$. 

According to [8], for any admissible $(R,D)$ pair we can 
find two parameters $s \geq 0$ and $t \leq 0$ so that $F(R,D)$ can be 
parametrically evaluated as 

$$F(R,D) = sR - stD + \max_{q_1}\left(-\log f(q_1)\right) = sR - stD - \log \min_{q_1} f(q_1)$$

where 

$$f(q_1) = \sum_j p_j \left(\sum_k q_k 2^{t\delta_{jk}}\right)^{-s}$$

and $R$, $D$ are given in terms of the optimizing $q_1^*$ which we will 
discuss later. 

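For the distortion measure in (3), and noting that $q_0 = 1 - q_1$, 
we have 

$$f(q_1) = \bar{p}\left((1-q_1)2^t + q_1 2^{2t}\right)^{-s} + p\left((1-q_1)2^t + q_1\right)^{-s}$$

which is a convex function in $q_1$. We then see that 

$$\frac{\partial f}{\partial q_1} = 0 \iff q_1 = \beta \triangleq \frac{\rho - 2^t}{(1 - 2^t)(1 + \rho)}, \qquad \rho = \left(\frac{p\,2^{ts}}{\bar{p}}\right)^{\frac{1}{1+s}}.$$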

In order to minimize $f(q_1)$ over $q_1 \in [0,1]$, we consider 
the following three cases where the optimal $q_1^*$ is either on the 
boundary or at a point with zero gradient. 

• Case 1: $0 < p \leq \frac{2^t}{1+2^t}$, then $\beta \leq 0$. Since $f$ is convex, it 
is non-decreasing in the interval $[\beta, \infty)$ and therefore in the 
interval $[0,1]$. Thus, the optimal $q_1^* = 0$ and we can also 
compute from (6), (7) that 

$$R = 0; \quad F = 0 = D_{KL}(u \| p)$$

where in this case $u = p$. 

• Case 2: $1 > p \geq \frac{1}{1+2^{t(2s+1)}}$, then $\beta \geq 1$. Since $f$ is convex, 
it is non-increasing in the interval $(-\infty, \beta]$ and therefore in 
the interval $[0,1]$. Thus, the optimal $q_1^* = 1$ and similarly we 
get 

$$R = 0; \quad F = D_{KL}(u \| p)$$

where in this case $u = 1 - \frac{D}{2}$. We can further see that 
$D \in [2(1-p), 1]$ and $u \in [1-D, p]$. 

• Case 3: $\frac{2^t}{1+2^t} < p < \frac{1}{1+2^{t(2s+1)}}$, then $\beta \in (0,1)$. In this 
case, the optimal $q_1^* = \beta$. We then can find $2^t = \frac{u+D-1}{2-u-D}$ 
according to (5) and plug in (6), (7) to get³ 

$$R = H(u) - H(u + D - 1)$$
$$F = D_{KL}(u \| p)$$

where $u = \dfrac{(p\,2^{ts})^{\frac{1}{1+s}}}{\bar{p}^{\frac{1}{1+s}} + (p\,2^{ts})^{\frac{1}{1+s}}}$. With this notation of $u$, we can 
express $q_1^* = \frac{1-D}{3-2(u+D)}$ and $q_0^* = \frac{2(1-u)-D}{3-2(u+D)}$. We can see that 
$D \in (1-u, 2(1-u))$. It can also be verified that, in this case, 
by varying $s$ and $t$, $u$ spans $(1-D, 1-D/2)$ and $R$ spans 
$(0, H(1-D))$. 
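
The three cases can be checked numerically with a short Python sketch. It assumes $s > 0$ and $t < 0$ strictly (so that $2^t < 1$) and uses the stationary point $\beta = (\rho - 2^t)/((1-2^t)(1+\rho))$ with $\rho = (p\,2^{ts}/\bar{p})^{1/(1+s)}$, which follows from setting the derivative of the mBM-1 expression for $f(q_1)$ to zero. The function names are illustrative only.

```python
def f(q1, p, s, t):
    """f(q1) for the mBM-1 distortion measure, with q0 = 1 - q1."""
    a = 2.0 ** t
    return ((1 - p) * ((1 - q1) * a + q1 * a * a) ** (-s)
            + p * ((1 - q1) * a + q1) ** (-s))


def optimal_q1(p, s, t):
    """Minimizer of the convex function f over [0, 1] (assumes s > 0, t < 0)."""
    a = 2.0 ** t
    rho = (p * a ** s / (1 - p)) ** (1.0 / (1 + s))
    beta = (rho - a) / ((1 - a) * (1 + rho))   # point with zero gradient
    return min(1.0, max(0.0, beta))            # Cases 1 and 2 clip to the boundary
```

For example, with s = 1 and t = -1 the two thresholds on p are 1/3 and 8/9, so p = 0.2 falls in Case 1, p = 0.95 in Case 2, and p = 0.8 in Case 3.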

Based on the above analysis, we obtain the following 
lemmas and theorems. 

Lemma 1: Let $h(u) = H(u) - H(u + D - 1)$ map $u \in [1-D, 1-D/2)$ 
to $R$. Then, the inverse mapping of $h$, 

$$h^{-1} : (0, H(1-D)] \to [1-D, 1-D/2),$$

is well-defined and maps $R$ to $u$. 

Proof: We first notice that $h(u)$ is strictly decreasing 
since the derivative is negative over $[1-D, 1-D/2)$, hence 
the mapping $h : [1-D, 1-D/2) \to (0, H(1-D)]$ is one-to-one. 
From the analysis above, one can also see that $h$ is 
onto. ■ 
Theorem 1: Using mBM-1 with $2^R$ decoding attempts 
where $R \in (0, N H(1 - \frac{D}{N})]$, the maximum rate-distortion exponent 
that can be achieved is 

$$F = N D_{KL}\left(h^{-1}(R/N) \,\|\, p\right). \tag{8}$$

Proof: First, note that in our context where we have a 
source sequence $x^N$ of $N$ i.i.d. source components, the rate 
and exponent for each source component are now $\frac{R}{N}$ and $\frac{F}{N}$ 
(with $h$ evaluated at the per-component distortion $\frac{D}{N}$). 
From Case 3 in the analysis above and from Lemma 1 we 
have 

$$F/N = D_{KL}(u \| p) = D_{KL}\left(h^{-1}(R/N) \,\|\, p\right)$$

and the theorem follows. ■ 
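
Since $h$ is strictly decreasing, the exponent in (8) can be evaluated numerically by bisection. The Python sketch below is one way to do this, assuming the per-component distortion $d = (N-K+1)/N$ and base-2 logarithms; the function names and the fixed number of bisection steps are illustrative choices.

```python
import math


def H(x):
    """Binary entropy in bits, with the argument clamped to [0, 1]."""
    x = min(max(x, 0.0), 1.0)
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def dkl(u, p):
    """Binary Kullback-Leibler divergence D_KL(u || p) in bits, for 0 < p < 1."""
    total = 0.0
    if u > 0:
        total += u * math.log2(u / p)
    if u < 1:
        total += (1 - u) * math.log2((1 - u) / (1 - p))
    return total


def h_inverse(r, d, iters=200):
    """Invert h(u) = H(u) - H(u + d - 1) on [1 - d, 1 - d/2) by bisection."""
    lo, hi = 1.0 - d, 1.0 - d / 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        if H(mid) - H(mid + d - 1) > r:   # h is decreasing: move right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def max_exponent(R, N, K, p):
    """Evaluate F = N * D_KL(h^{-1}(R/N) || p) for 2**R attempts of mBM-1."""
    d = (N - K + 1) / N
    return N * dkl(h_inverse(R / N, d), p)
```

A call such as `max_exponent(11, 255, 239, p)` then gives the exponent for $2^{11}$ attempts on the (255,239) code, and $2^{-F}$ serves as the corresponding approximation of the error probability.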
Lemma 2: Let $g(u) = D_{KL}(u \| p)$ map $u \in [1-D, p]$ to $F$. 
Then, the inverse mapping of $g$, 

$$g^{-1} : [0, D_{KL}(1-D \,\|\, p)] \to [1-D, p]$$

is well-defined and maps $F$ to $u$. 

Proof: We first see that $g(u)$ is a strictly convex function 
that achieves its minimum value at $u = p$ and therefore $g(u)$ is 
strictly decreasing over $[1-D, p]$. Thus, the mapping 
$g : [1-D, p] \to [0, D_{KL}(1-D \,\|\, p)]$ is one-to-one. From the analysis 
above, one can also see that $g$ is onto. ■ 

Theorem 2: In order to achieve a rate-distortion exponent 
of $F \in [0, N D_{KL}(1 - \frac{D}{N} \,\|\, p)]$, the minimum number of decoding 
attempts required for mBM-1 is $2^R$ where 

$$R = N\left[H\left(g^{-1}(F/N)\right) - H\left(g^{-1}(F/N) + D/N - 1\right)\right]^+.$$
Proof: We also note that the rate, distortion and exponent 
for each source component are $\frac{R}{N}$, $\frac{D}{N}$ and $\frac{F}{N}$ respectively. 
Combining all the cases in the above analysis, we have 

$$R/N = \left[H\left(g^{-1}(F/N)\right) - H\left(g^{-1}(F/N) + D/N - 1\right)\right]^+$$

and the theorem follows. ■ 

³ The binary entropy function is $H(u) = -u \log u - (1-u) \log(1-u)$ and 
the Kullback-Leibler divergence is $D_{KL}(u \| p) = u \log \frac{u}{p} + (1-u) \log \frac{1-u}{1-p}$. 

Figure 1. Performance of various decoding algorithms for the (255,239) 
RS code over an AWGN channel. 
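
Theorem 2 can be evaluated in the same numerical fashion by inverting $g$ with bisection. The following Python sketch assumes $1 - D/N < p < 1$, base-2 logarithms, and a target exponent within the range stated in the theorem; the helper names are hypothetical.

```python
import math


def H(x):
    """Binary entropy in bits, with the argument clamped to [0, 1]."""
    x = min(max(x, 0.0), 1.0)
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def dkl(u, p):
    """Binary Kullback-Leibler divergence D_KL(u || p) in bits, for 0 < p < 1."""
    total = 0.0
    if u > 0:
        total += u * math.log2(u / p)
    if u < 1:
        total += (1 - u) * math.log2((1 - u) / (1 - p))
    return total


def g_inverse(f, d, p, iters=200):
    """Invert g(u) = D_KL(u || p), strictly decreasing on [1 - d, p], by bisection."""
    lo, hi = 1.0 - d, p
    for _ in range(iters):
        mid = (lo + hi) / 2
        if dkl(mid, p) > f:   # g is decreasing: move right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def min_rate(F, N, K, p):
    """R such that 2**R attempts of mBM-1 reach exponent F (positive part applied)."""
    d = (N - K + 1) / N
    u = g_inverse(F / N, d, p)
    return N * max(0.0, H(u) - H(u + d - 1))
```

The positive-part operation covers the region where $u$ exceeds $1 - D/(2N)$, in which a single decoding attempt (R = 0) already attains the requested exponent.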

V. SIMULATION 

Simulations of the proposed algorithm were conducted 
for the (255,239) RS code over an AWGN channel with 
BPSK as the modulation format. In Fig. 1, the mBM-2(RD,11) 
curve belongs to the algorithm mBM-2 using the RD 
approach proposed in [4] while the mBM-2(RDE,11) one 
corresponds to the algorithm mBM-2 using the RDE approach 
proposed in this paper. The label SED(12,12) denotes the 
algorithm presented in [2]. While all these three algorithms 
use the same number of $2^{11}$ erasure patterns, at an FER of 
$10^{-4}$, the mBM-2(RDE,11) provides a performance gain of 
roughly 0.4 dB over the SED(12,12) and outperforms the 
mBM-2(RD,11) by about 0.1 dB. The conventional HDD and 
the GMD algorithms have modest performance since they 
use only one or a few decoding attempts. Compared to the 
conventional HDD, the proposed algorithm mBM-2(RDE,11) 
gives approximately a 0.9 dB gain. It also outperforms the 
Koetter-Vardy (KV) algorithm [13] with infinite multiplicity. 
The performance of mBM-2(RDE,11) is roughly the 
same as the performance of mASD-3(RDE,11). This implies 
that, for this setup, algorithms based on multiple trials of BM 
decoding perform as well as algorithms based on running 
the more complicated ASD decoding multiple times. In Fig. 2, 
we simulate the performance of mBM-1(RDE,11) for the 
same RS code over an m-SC channel. One curve reflects 
the simulated frame-error rate (FER) and the other is the 
approximation derived from $2^{-F}$ where $F$ is given in (8) 
with $R = 11$. 

VI. CONCLUSION 

An RDE-based algorithm has been proposed for multiple 
decoding attempts of RS codes. The RDE analysis 
shows that this approach has several advantages. Firstly, 
the RDE approach achieves a near optimal performance-versus-complexity 
trade-off among algorithms that consider 
running a decoding scheme multiple times (see Remark 1). 
Secondly, it can help one estimate the error probability using 
exponentially tight bounds for large enough $N$. Simulations 
are used to confirm that algorithms using this approach 
achieve a better trade-off than previously known algorithms. 
Along with this, a numerical method is given to compute the 
required RDE function. 

Figure 2. Performance of mBM-1(RDE,11) and its approximation $2^{-F}$ 
where $F$ is given in (8) for the (255,239) RS code over an m-SC(p) channel. 

Our future work focuses on extending this approach to 
analyze multiple decoding attempts for ISI channels. In this 
case, it makes sense for the decoder to consider multiple 
candidate error-events during decoding. Extending the RD 
approach directly gives a RD problem for Markov sources 
in the large distortion regime. Some work is required though 
because this is a well-known open problem. 